Inducing Sense-Discriminating Context Patterns from Sense-Tagged Corpora
نویسندگان
چکیده
Traditionally, context features used in word sense disambiguation are based on collocation statistics and use only minimal syntactic and semantic information. Corpus Pattern Analysis is a technique for producing knowledge-rich context features that capture sense distinctions. It involves (1) identifying sense-carrying context patterns and (2) using the derived context features to discriminate between the unseen instances. Both stages require manual seeding. In this paper, we show how to automate inducing sense-discriminating context features from a sense-tagged corpus.
منابع مشابه
Discriminating Among Word Meanings by Identifying Similar Contexts
Word sense discrimination is an unsupervised clustering problem, which seeks to discover which instances of a word/s are used in the same meaning. This is done strictly based on information found in raw corpora, without using any sense tagged text or other existing knowledge sources. Our particular focus is to systematically compare the efficacy of a range of lexical features, context represent...
متن کاملAutomatic Acquisition of Sense Tagged Corpora
An important problem in Natural Language Processing is identifying thecorrect sense of a word in a particular context. Thus far, statistical methods have been considered the best techniques in word sense disambiguation. Unfortunately, these methods produce high accuracy results only for a small number of preselected words. The reduced applicability of statistical methods is due basically to the...
متن کاملBootstrapping Large Sense Tagged Corpora
The performance of Word Sense Disambiguation systems largely depends on the availability of sense tagged corpora. Since the semantic annotations are usually done by humans, the size of such corpora is limited to a handful of tagged texts. This paper proposes a generation algorithm that may be used to automatically create large sense tagged corpora. The approach is evaluated through comparative ...
متن کاملWord Sense Disambiguation Using Random Indexing
This paper presents the results of an experiment to apply a novel semantic representational formalism called Random Indexing for the supervised word sense disambiguation of English words. Random Indexing uses high-dimensional sparse vectors with random patterns modeling neural activation patterns in the brain to represent linguistic information. The presented learning and disambiguating method ...
متن کاملTowards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...
متن کامل